102 research outputs found

    Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition

    Computers can understand and then engage with people in an emotionally intelligent way thanks to speech emotion recognition (SER). However, the performance of SER in cross-corpus and real-world live data feed scenarios can be significantly improved. The inability to adapt an existing model to a new domain is one of the shortcomings of SER methods. To address this challenge, researchers have developed domain adaptation techniques that transfer knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved performance across domains, they can be improved to adapt to a real-world live data feed situation where a model can self-tune while deployed. In this paper, we present a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained model to a real-world live data feed setting while interacting with the environment and collecting continual feedback. RL-DA is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation schemas. Evaluation results show that in a live data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in cross-corpus and cross-language scenarios, respectively.

    Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

    Despite recent advances in speech emotion recognition (SER) within a single-corpus setting, the performance of these SER systems degrades significantly in cross-corpus and cross-language scenarios. The key reason is the lack of generalisation in SER systems towards unseen conditions, which causes them to perform poorly in cross-corpus and cross-language settings. To address this issue, recent studies focus on utilising adversarial methods to learn domain-generalised representations for improving cross-corpus and cross-language SER. However, many of these methods only focus on cross-corpus SER without addressing the cross-language SER performance degradation due to a larger domain gap between source and target language data. This contribution proposes an adversarial dual discriminator (ADDi) network that uses a three-player adversarial game to learn generalised representations without requiring any target data labels. We also introduce a self-supervised ADDi (sADDi) network that utilises self-supervised pre-training with unlabelled data. We propose synthetic data generation as a pretext task in sADDi, enabling the network to produce emotionally discriminative and domain-invariant representations and providing complementary synthetic data to augment the system. The proposed model is rigorously evaluated using five publicly available datasets in three languages and compared with multiple studies on cross-corpus and cross-language SER. Experimental results demonstrate that the proposed model achieves improved performance compared to the state-of-the-art methods. Comment: Accepted in IEEE Transactions on Affective Computing.

    Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

    Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions that require meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary tasks. The semi-supervised nature of MTL-AUG allows for the exploitation of abundant unlabelled data to further boost the performance of SER. We comprehensively evaluate the proposed framework in the following settings: (1) within corpus, (2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB datasets show improved results compared to existing state-of-the-art methods. Comment: Under review at IEEE Transactions on Affective Computing.

    HER-2 Immunohistochemical Expression in Bone Sarcomas: A New Hope for Osteosarcoma Patients

    BACKGROUND: Osteosarcoma and chondrosarcoma remain the most common primary bone tumours. Questions have been raised about the prognostic influence of HER-2 in bone sarcomas, but so far the results have been debatable. HER-2 expression is possibly a predictor of chemotherapy response. AIM: In this study, we investigated the extent of HER-2 expression in bone sarcomas and attempted to correlate it with pertinent variables that will help to provide better treatment options, especially for metastatic ones. MATERIAL AND METHODS: Fifty-two cases of bone sarcoma (32 osteosarcoma cases and 20 chondrosarcoma cases) were studied for HER-2 immunohistochemical expression, which was then correlated with all available clinicopathologic features. RESULTS: Most of the osteosarcoma cases exhibited membranous staining (78.1%). Strong staining (score 3+) was observed in 34.4%, moderate staining (score 2+) in 21.9%, and weak staining (score 1+) in 21.9%; no staining (score 0) was detected in 7 of 32 cases (21.9%). In chondrosarcoma, staining was absent in all examined cases. Immunohistochemical HER-2 overexpression correlated significantly with osteosarcoma site (P = 0.004), with variation relating HER-2 intensity score to the site of osteosarcoma (P = 0.051). A statistically significant negative correlation was detected between HER-2 expression and the presence of metastasis at time of diagnosis (P = 0.006). A significant correlation was also found between HER-2 score and presence of metastasis (P = 0.046), as more than half of the cases with no metastasis at diagnosis (17/28 cases, 60.7%) showed a positive intensity score. A statistically significant correlation was detected between HER-2 expression and patients’ age (P = 0.044). HER-2 expression also correlated significantly with histopathological detection of fibrous tissue (P = 0.033). Higher scores of HER-2 expression were associated with significantly better differentiation (P = 0.038), since detection of wide areas of osteoid was associated with higher HER-2 scores. CONCLUSION: Further research is still needed to delineate the role of HER-2 as a new hope for therapeutic targeting in bone sarcoma patients, mainly in osteosarcoma, in contrast to chondrosarcoma, which did not express HER-2 at all.

    Towards Optimal Kinetic Energy Harvesting for the Batteryless IoT

    Traditional Internet of Things (IoT) sensors rely on batteries that need to be replaced or recharged frequently, which impedes their pervasive deployment. A promising alternative is to employ energy harvesters that convert environmental energy into electrical energy. Kinetic Energy Harvesting (KEH) converts ambient motion/vibration energy into electrical energy to power the IoT sensor nodes. However, most previous works employ KEH without dynamically tracking the optimal operating point of the transducer for maximum power output. In this paper, we systematically analyse the relation between the operating point of the transducer and the corresponding energy yield. To this end, we explore the voltage-current characteristics of the KEH transducer to find its Maximum Power Point (MPP). We show how this operating point can be approximated in a practical energy harvesting circuit. We design two hardware circuit prototypes to evaluate the performance of the proposed mechanism and analyse the harvested energy using a precise load shaker under a wide set of controlled conditions typically found in human-centric applications. We analyse the dynamic current-voltage characteristics and specify the relation between the MPP sampling rate and harvesting efficiency, which outlines the need for dynamic MPP tracking. The results show that the proposed energy harvesting mechanism outperforms the conventional method in terms of generated power and offers at least one order of magnitude higher power than the latter.

    Survey of deep representation learning for speech emotion recognition

    Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques, related challenges and identify important future areas of research. Our survey bridges the gap in the literature since existing surveys either focus on SER with hand-engineered features or representation learning in the general setting without focusing on SER.

    A survey on video segmentation for real-time applications

    Video object segmentation extracts moving and static objects from consecutive video frames. It is a prerequisite for visual content retrieval (e.g., MPEG-7 related schemes), object-based compression and coding (e.g., MPEG-4 codecs), object recognition, object tracking, security video surveillance, traffic monitoring for law enforcement, and many other applications.